Search Results: "Robert Collins"

6 February 2010

Robert Collins: Adding new languages to Ubuntu


Scott recently noted that we don't have Klingon available in Ubuntu. Klingon is available in ISO 639, so adding it should be straightforward. Last time I blogged about this, three packages needed changing, as well as Launchpad needing a translation team for the language. The situation is a little better now: only two packages need changing, as gdm now dynamically looks for languages based on installed locales. libx11 still needs changing; a minimal diff would be:
=== modified file 'nls/compose.dir.pre'
--- libx11-1.2.1/nls/compose.dir.pre
+++ libx11-1.2.1/nls/compose.dir.pre
@@ -406,0 +406,1 @@
+en_US.UTF-8/Compose:     tlh_GB.UTF-8
=== modified file 'nls/locale.alias.pre'
--- libx11-1.2.1/nls/locale.alias.pre
+++ libx11-1.2.1/nls/locale.alias.pre
@@ -1083,0 +1083,1 @@
+tlh_GB.utf8:           tlh_GB.UTF-8
 === modified file 'nls/locale.dir.pre'
--- libx11-1.2.1/nls/locale.dir.pre
+++ libx11-1.2.1/nls/locale.dir.pre
@@ -429,0 +429,1 @@
+en_US.UTF-8/XLC_LOCALE:       tlh_GB.UTF-8
 
Secondly, langpack-locales has to change for two reasons. Firstly, a locale definition has to be added (locales define a place, a language, and locale information like days of the week, phone number formatting and so on). Secondly, the language needs to be added to the SUPPORTED list in that package, so that language packs are generated from Launchpad translations. Now, gdm autodetects languages, but it turns out that only complete locales were being shown, and that on Ubuntu this was not looking at language pack directories, but rather at
/usr/share/locale
which langpack-built packages do not install translations into. So it could be a bit random whether a language showed up in gdm. Martin Pitt has kindly turned on the with-incomplete-locales configure flag to gdm, and this will permit less completely translated locales to show up (when their langpack is installed; without the langpack nothing will show up).
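As a concrete illustration of the SUPPORTED change: the SUPPORTED list is a plain text file of locale names and charsets, so the Klingon entry would presumably be a single line like the one below. This is only a hedged sketch based on the tlh_GB locale name used in the diff above, not a line taken from the actual package:
tlh_GB.UTF-8 UTF-8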

24 January 2010

Robert Collins: LCA 2010 Friday


Tridge on "Patent defence for open source projects". Watch it! Some key elements:
  • A prior art defence is very, very hard; non-infringement is a much better defence because you only need to show you don't do the independent claims.
  • Reading a patent doesn't really harm us, because triple damages are no less fatal than single damages :) Reading patents to avoid them is a good idea.
  • Dealing with patents is very technical. It needs training (and the talk has that training).
  • Patents are hard to read.
  • Claims are often interpreted much more specifically than engineers expect.
  • The best prior art is our own source code, with VCS date records; the exact date often matters because of the priority date.
  • Invalidation: dead patents are vampires, and when they come back they are harder to kill again. Read the file wrapper, the audit log containing all correspondence between the patent office and the applicant.
  • Patents are not code: judges can vary the meaning.
  • Claim charts are what you use to talk to patent lawyers.
  • Build workarounds *and publish them*. Encourage others to adopt them.

21 January 2010

Robert Collins: LCA 2010 Friday keynote/lightning talks


Nathan Torkington gave three lightning keynotes: 1) Lessons learnt! "Technology solves problems": no it doesn't, it's all about the meatsacks! If you live a good life you'll never have to care about marketing; steer the meatsacks: English is an imperative language for controlling meatsacks. Tell the smart meatsacks what you want (for them, English is declarative). 2) Open source in New Zealand: a bit of a satire :) A sheep calculator, tattoos as circuit diagrams. The reserve bank apparently has a *working* water-economy simulator. Shades of Terry Pratchett! 3) Predictions: more satire about folk that make predictions, such as financial analysts and science journalists. After that, it was lightning talk time. I've just grabbed some highlights. Selena Deckelmann talked about going to Ondo in Nigeria and un-rigging an election:
  1. Run for political office.
  2. Lose, even though polls had suggested the reverse result.
  3. Don't give up: protest (filed May 14 2007).
  4. Use technology: fingerprint scanning found 84814 duplicate fingerprints, 360 exactly the same fingerprints.
  5. Patience: 2 years later the courts reversed the election.
http://flossmanuals.net has nice friendly manuals in many languages, written at book sprints. Kate Olliver presented on making an origami penguin. Mark Osbourne presented the Open Source School, a school in New Zealand that has gone completely open source, even though the NZ school system pays Microsoft 10 million/year for a country-wide license.

Robert Collins: LCA 2010 Thursday


Jeremy Allison on "The elephant in the room: free software and Microsoft". While he works at Google, this talk was off the leash, not about Google :) . As usual, grab the video :) We should care about Microsoft because Microsoft's business model depends on a monopoly [the desktop]. Microsoft are very interested in Open Source: Apache, MIT and BSD licenced software; the GPL is intolerable. Jeremy models Microsoft as a collection of warring tribes that hate each other, e.g. Word vs Excel. The first attack was on protocols: make the protocols more complex and sophisticated. MS have done this on Kerberos, DCE/RPC, HTTP, and higher up the stack via MSIE rendering modes, ActiveX plugins and Silverlight. The EU case was brought about this in the "Workgroup Server Market". MS were fined 1 billion Euros and forced to document their proprietary protocols. OOXML showed up rampant corruption in the ISO standards process, but it got through even though it was a battle against nearly everyone! On the good side, it resulted in an investigation into MS dominance in file formats: MS implemented ODF, and MS have had to document their old formats. MS have an ongoing battle on the world wide web: IE vs Firefox, ajax applications vs Silverlight. All of these things are long term failures for MS, so what next? Patents :( . Patents are GPL incompatible, but fine with BSD/MIT. The TomTom case is the first direct attack using MS's patent portfolio. This undermines all the outreach work done by the MS Open Source team, which Jeremy tells us are true believers in open source, trying to change MS from the inside. Look for MS pushing RAND patented standards: such things lock us out. Netbooks are identified as a key point for MS to fight on: lose that and the desktop position is massively weakened. We should:
  • Keep creating free software and content *under a copyleft license*.
  • Keep pressure on Governments and organisations to adopt open standards and investigate monopolies.
  • Lobby against software patents.
  • Search for prior art on relevant patents and destroy them.
  • Working for a corporation is a moral choice: respectfully call out MS employees.
Jonathan Oxer spoke about the Google Moon X-prize and the lunarnumbat.org project; it needs contributors: software and hardware hackers, arduino/beagleboard/[M]JPEG2000 geeks, code testers and reviewers, web coding, documentation, math heads & RF hackers. Sounds like fun; now to find time! Paul McKenney did another RCU talk, and as always it was interesting: "Optimisation Gone Bad (RCU in Linux 1993-2008)". The Linux 2.6 -rt patch made RCU much, much, much more complex, with atomic operations, memory barriers and frequent cache misses; since then it has slowly been whittled back, but there is now a new, simpler RCU based around the concept of doing the accounting during context switches & tracking running tasks.

20 January 2010

Robert Collins: LCA 2010 Thursday Keynote Glyn Moody


Glyn Moody: Hackers at the end of the world. Rebel Code is now 10 years old (50+ interviews over a year) and could be considered an archaeology now :) I probably haven't done the keynote justice; it was excellent but high density, so you should watch it online ;) Glyn talks about open access, with various examples like the Public Library of Science (and how the scientific magazine business made 30%-40% profit margins). The Human Genome Project & the Bermuda Principles: public submission of annotated sequences. In 2000 Celera were going to patent the entire human genome. Jim Kent spent 3 weeks writing a program to join together the sequenced fragments on 100 800MHz Pentium PCs. This was put into the public domain just before Celera completed their processing, and by that action Celera were prevented from patenting *us*. Openness as a concept is increasing within the scientific community: open access to results, open data, open science (the full process). An interesting aspect to it is open notebook science: daily writeups, not peer reviewed: release early, release often for science. Amazingly, Project Gutenberg started in 1971! Glyn ties together the scientific culture (all science is open to some degree) and artistic culture (artists share and build on/reference each other's work) by talking about a lag between the free software and free content worlds. In 1999 Larry Lessig set up Copyright's Commons, built around an idea of counter-copyright: copyleft for non-code. This didn't really fly, and Creative Commons was set up 2 years later. Wikipedia and newer sharing concepts like twitter/facebook etc are discussed. But what about the real world: transparency and governments, or companies? They are opening up. However, data release != control release. And there are challenges we all need to face:
  • The GFinancialC (global financial crisis): "my gain is your loss". A very opaque system.
  • The GEnvironmentalC (global environmental crisis): "my gain is our loss".
Glyn argues we need a different approach to economic governance: the commons. The 2009 Nobel laureate for Economic Sciences, Elinor Ostrom, works on commons and their management via user associations, which is what we do in open source! Awesome!

Robert Collins: LCA 2010 Wednesday


Pandora-build: I was there for support, as I've contributed patches. Pandora is a set of additional glue and layers to improve autotools and make it easier to work with things like gettext and gnulib, turn on better build flags and so forth. If you're using autotools it's well worth watching this talk, or hop on #drizzle and chat to mtaylor :) The open source database survey talk from Selena was really interesting: a useful way of categorising databases and a list of which dbs turned up in which category, e.g. high availability, community development model etc. Key takeaway: there is no one true db. I gave my subunit talk in the early afternoon; it was reasonably well received I think, though I wish I had been less sick last week: I would have loved to have made the talk more polished. Ceph seems to be coming along gangbusters. I really think it would be great to use for our bzr hosting backend. 0.19 will stabilise the disk format! However we might not be willing to risk btrfs yet :( Next up, the worst inventions ever... catch it live if you can!

19 January 2010

Robert Collins: LCA2010 Wednesday Keynote


Another must-grab-the-video talk: Mako's keynote. Antifeatures: principles vs pragmatism do come together. The principled side (RMS & the FSF): it is important to control one's technology because it's important to control one's life. The pragmatic side: quality, no vendor lock-in etc. A false dichotomy: freedom imparts pragmatic benefits even though it doesn't intrinsically impart quality or good design: 95% of projects have 5 or fewer contributors, the median number of contributors is 1, and such small collaborations are no different than a closed source one. Definition of antifeatures: built functionality to make a product do something one does not want it to do. Great example of phone books: spammers pay for access to the lists, and thus we have to pay *not to be listed*, but it's actually harder to list and print our numbers in the first place. Mako makes a lovely analogy to the mafia there. Similarly with Sony charging 50 dollars not to install trialware on Windows laptops in the past. Cameras: Canon cameras disabled RAW saving; CHDK, an open source addon for the camera, outputs RAW again. Panasonic are locking down their cameras to reject third party batteries. The TiVo is an example of how focusing on licensing can miss the big picture: free stack, but still locked into a subscription to get an ongoing revenue stream. Dongles! Mako claimed there wasn't a facebook appreciation group for dongles; there is. GitHub: paying for the billing model; there is lots of code there to figure out how many projects are in a repo, so that they can charge on that basis. DRM is the mother of all antifeatures: 10K people writing DRM code that no users want!

Robert Collins: LCA 2010 Tuesday


Gabriella Coleman's keynote was really good; grab it from the videos once they come online. WETA run Ubuntu for their render farm: 3700 machines, 35000 cores, 7kW per cold rack and 22kW per hot rack. (Hot racks are rendering, cold racks are storage.) Wow. Another talk well worth watching if you are at all interested in the issues related to running large numbers of very active machines in a small space. And a classic thing from the samba4 talk at the start of the afternoon: MS AD domain controllers do no validation of updates from other domain controllers; classic crunchy-surface security. (Discovered by samba4 borking AD while testing r/w replica mode.) Blu-ray on linux is getting there; however one sad thing is that the Blu-ray standard has no requirement that vendors make players able to play un-encrypted content, and there are some hints that in fact licences may require them to not play un-encrypted content. Peter Chubb's talk on Articulate was excellent for music geeks: MIDI that sounds like music, from LilyPond. Ben Balbo talked about "Roll your own dropbox". Ben works at a multimedia agency, but the staff work locally and don't use the file server; they use instant messenger to send files around! They tried using subversion: too hard. Dropbox looked good, but at 3-7 hundred a month it was too pricey given an existing 1.4TB of spare capacity. He then considered svn + cron, but deleted directories cause havoc & something automatic was wanted, so git + cron instead. The key thing in doing this was having a work area with absolutely no metadata. Conflicts are dealt with by filename.conflict.DATESTAMP.HOSTNAME.origextention. It doesn't trigger off inotify, there's no status bar widget, it's only single user etc at the moment, but it was written to meet the office needs so is sufficient. Interestingly he hadn't looked at e.g. iFolder.

10 January 2010

Robert Collins: Announcing testrepository


For a while now I've been using subunit as part of my regular development workflow. I would pipe test results to a file, use subunit to report on failures from that file, and be able to inspect all the failures at my leisure without rerunning tests or copy and pasting from far back in my history. However this is a bit ad hoc, and it's not trivial to get good pipelines together: while it's not hard, it's not obvious either. And commands like tee are less readily available for Windows users. So during my holidays I started a small project to automate this workflow. I didn't get all that much done due to a combination of travel and coming down with a nasty bug near the end of my holidays, which I'm now recovering from. Yay health returning + medicines. If only we had medichines :) . However, I managed to get a reasonable first release out the door this evening. Grab it from Launchpad or PyPI. Testrepository has a few deps, all listed in INSTALL.txt. Folk on Ubuntu Lucid should be able to just apt-get them all (sudo apt-get install subunit will be enough to run testrepository). If you're not on Lucid you can grab the debs manually, or use the subunit PPA (sudo add-apt-repository ppa:subunit), though I've noticed just today that the karmic subunit build there only works with python 2.5, not the default of 2.6; I will fix that at some point. Using Testrepository is easy if you are developing python code:
$ testr init
$ python -m subunit.run project.tests.test_suite | testr load
id: 0 tests: 114
This will report any failures that occur. To see them again:
$ testr last
id: 0 tests: 114
The actual subunit streams are stored in .testrepository in sequentially numbered files (for now at least). So it's very easy to get at them (for instance, subunit-stats < .testrepository/12). If you are not using python, you can still use subunit easily if you are using shunit, check or cppunit: subunit ships with bindings for shunit and cppunit, and check uses libsubunit with the CK_SUBUNIT output mode. TAP users can use tap2subunit to get a subunit stream from a TAP based testsuite. It's still early days but I'm finding this much nicer than the ad hoc subunit management I was doing before.
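Because the stored streams are just subunit, the Python API can replay them too. A rough sketch (assuming python-subunit's ProtocolTestCase and a stream named 0 in .testrepository; neither of those names comes from the post above):
import unittest
import subunit

# Replay a stored stream into a plain TestResult and summarise it.
stream = open('.testrepository/0', 'rb')
result = unittest.TestResult()
subunit.ProtocolTestCase(stream).run(result)
stream.close()
print("ran %d tests, %d failures" % (result.testsRun, len(result.failures)))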

3 January 2010

Robert Collins: More evolution-reliability-speed


Evolution recently moved to a sqlite summary db rather than a custom summary db implementation. It's great to see such reuse of code. However, it's not really a complete transition yet, as I've had cause to find out today. I've blogged before about performance with the sqlite summary database. Today I was greeted with a crash-on-startup bug, which happily has a patch upstream already. Before I looked in the bug tracker though, I did some house cleaning. I started with a 900MB folders.db. Doing a vacuum on the db dropped that to 300MB. It doesn't appear to be something that evolution does itself. Firefox too appears to lack an automatic vacuum. sqlite is an embedded database, and it's wonderful at doing that, but it's not as install-and-forget as (say) PostgreSQL, which does autovacuum. So an additional tip is to vacuum your folders, e.g. with http://www.gnome.org/~sragavan/evolution-rebuild-summarydb, a helper script that will run vacuum on all your account summary dbs. Note that it *does not rebuild*, it solely vacuums, and as such does not add or delete (modulo bugs in sqlite) data in the summary database. After the housecleaning, I checked that the sqlite database was in good condition:
sqlite3 folders.db
pragma integrity_check;
This returned a number of indexing issues, so I reindexed:
reindex;
Evolution now starts up and crashes in a fraction of a second - a big improvement. Finally, I started looking at the evolution code, as I was now fairly confident it was a bug - it was in a sqlite callback function - and the column the function extracts data from (flags) is missing a NOT NULL constraint, but the code doesn't check for NULL - boom. From there to finding the bug report and existing patch was trivial. And this is where my comment on reliability turns up: Evolution doesn't anticipate NULL flag values in its code, so why does it insert them into the database at all? I suspect it's due to some aspect of the incremental conversion to using sqlite summaries. More concerning for me is the possibility that there are many other such crash bugs lurking in the new sqlite based code. There are possibly some clues as to the excessive table scans done by evolution in the use of a flags bitset rather than separate columns, but I haven't looked closely enough to really say.
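The same housecleaning can be scripted rather than typed into the sqlite3 shell. A minimal sketch using Python's standard sqlite3 module (assuming folders.db is in the current directory; this is just the vacuum/check/reindex sequence above, not anything Evolution ships):
import sqlite3

# Vacuum the summary database, then reindex if the integrity check complains.
db = sqlite3.connect('folders.db')
db.execute('VACUUM')
if db.execute('PRAGMA integrity_check').fetchall() != [('ok',)]:
    db.execute('REINDEX')
db.close()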

23 December 2009

Robert Collins: bzr selftest uses testtools


The emperor has new clothes: bzr has just changed the base class for its test suite from unittest.TestCase to testtools.TestCase. This change has cleaned up a bunch of test logic, deleted a significant amount of code (much of which was redundant with Python unittest) and added some useful and important features. bzr has only been able to make this change due to testtools expanding its mission from a simple aggregation of proven unittest extensions into one that also includes new extensions that *make unittest more extensible*. My deepest thanks to Jonathan for permitting me to use testtools as the vehicle for these extension-enabling extensions (and for his patience in reviewing said changes!). The change was pretty easy: the bulk of the changes were in bzrlib.tests and bzrlib.tests.test_selftest. I chose to clean up an ugly API at the same time, which added a little scattershot change across a number of tests. And there are more changes that can be done to take better advantage of testtools; the amount of deleted and cleaned-up code isn't final. Even so, it's a pretty clear win:
18 files changed, 228 insertions(+), 496 deletions(-)
What went? bzr had an implementation of TestCase.run. This function is the main workhorse of Python's unittest module, and yet sadly it has to be replaced to change the exceptions that can be raised (to signal new outcomes), or to improve on test cleanup. Testtools provides an API to permit registering new exception types and handlers for them. Like Python 2.7, testtools also provides the TestCase.addCleanup API, and these two things combined mean that bzr no longer needs to reimplement the run method. For expected failures, bzr uses a helper method TestCase.expectFailure to perform an existing assertion and convert the test into an expected failure if that assertion does not trigger. This was another feature testtools already provides, and thus it got deleted. All the custom code for skipping and expected failures got deleted, and the other outcomes bzr uses turned into extensions (as per the run discussion above). In bzr, test cases generate a log (because bzr generates a log) and previously the TestResult in bzrlib inspected each test that had been executed to extract the log. This was made simpler by using the details API that testtools provides (see testtools.TestCase.addDetail), which permits tests to add arbitrary data in a semi-structured fashion. This is supported by subunit, and a long standing bug with bzr selftest --parallel was fixed as a result: logs from tests run in other processes are now carried across the process barrier intact and are presented cleanly. Some other minor cleanups are in unittest compatibility code, where bzr would degrade gracefully with unittest runners; testtools provides such logic comprehensively, so all that got deleted too. What's new? I think the most significant new facility that testtools offers bzrlib is assertThat. This assertion is inspired by the very nice assertThat in JUnit (which has changed substantially since Python's unittest was written based on it). This assertion separates the two concerns of "raise an exception" and "decide if an exception should be raised". The separation allows for better reuse of custom checking code, because it permits composition in a cleaner way than extra assertion methods permit. Testtools does not include many matchers as yet, but matchers are easy to write, and if one were to write a small adapter to the hamcrest library, there are a bunch of ready made matchers there (though they have a very Java feel, such as is_ not meaning is, which is why testtools did not use that library). Secondly, the addDetail API referenced above, in combination with testtools.TestCase.addOnException, will permit capturing the entire working area when a test fails, something that developers currently have to fiddle about with breakpoints to achieve. This hasn't been done, but is a straightforward patch I hope to do in the new year. Lastly, testtools offers testtools.TestCase.getUniqueInteger and testtools.TestCase.getUniqueString, which are not as yet used in bzr tests, but we may start using them soon. Beyond that, the other features of testtools are already present in bzrlib, and we simply need to find and delete more duplicated code.
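To make the two new facilities concrete, here is a small sketch of a test using them; the class name and detail name are illustrative only, not code from bzr, and it assumes testtools with its Equals matcher and text_content helper:
from testtools import TestCase
from testtools.content import text_content
from testtools.matchers import Equals

class TestExample(TestCase):

    def test_details_and_matchers(self):
        # Anything attached as a detail is reported alongside a failure
        # and carried through subunit streams.
        self.addDetail('note', text_content('some diagnostic text'))
        # assertThat delegates the "should this fail?" decision to a matcher.
        self.assertThat('a' * 3, Equals('aaa'))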

20 December 2009

Robert Collins: Various releases


Recently I've been working on the Python unittest API in my spare time, with a long term goal of making it possible to safely and sensibly glue many different plugins together into the core. Two important components of that goal are being able to extend the data included in a test result, and being able to change how a test is run (such as adding new exceptions that should be treated as specific outcomes; Python unittest uses exceptions to signal outcomes). In testtools 0.9.2 we have an answer to both those issues. I'm really happy with the "data included in outcomes" API, TestCase.addDetail. The API for extending outcomes works, but only addresses part of that issue for now. Subunit 0.0.4, which is available for older Ubuntu releases in the Subunit releases PPA now, and mostly built on Debian (so it will propagate through to Lucid in due course), has support for the addDetail API. Subunit now depends on testtools, reducing the non-protocol related code and generally making things simpler. Using those two together, bzr's parallelised test suite has been improved as well, allowing it to include the log file for tests run in separate processes (previously it was silently discarded). The branch to do this will be merged soon; it's just waiting on some sysadmin love to get these new versions into its merge-test environment. This change also provides complete capturing of the log when users want to supply a subunit log containing failed tests. The python code to do this is pretty simple:
def setUp(self):
    super(TestCase, self).setUp()
    self.addDetail("log", content.Content(content.ContentType("text", "plain",
         "charset": "utf8" ), lambda:[self._get_log(keep_log_file=True)]))
I've made a couple of point releases to python-junitxml recently, fixing some minor bugs. I need to figure out how to add the extra data that addDetail permits to the xml output. I suspect it's a strict superset and so I'll have to filter stuff down. If anyone knows about similar extensions done to junit's XML format before, please leave a comment :)

19 December 2009

Robert Collins: Debianising with bzr-builddeb


bzr-builddeb is very nice, but it can be very tricky to get started. I recently did a fresh debianisation of a project that is in bzr upstream, and I thought I'd record the recipe to make it work (at least until the various bugs making it hard are fixed). Assuming that the upstream uses bzr, it goes like this:
  1. Start with a branch that is close to the code you want to Debianise. E.g. if the release was off trunk, 3 commits back: bzr branch trunk -r -3 debian
  2. Debianise as normal: put the tarball with the right name in the parent dir, add a debian directory and fiddle until you build a package you're happy with. Don't commit while doing this.
  3. Build a source package: debuild -S, or bzr builddeb -S
  4. Revert your changes: bzr revert
  5. Import the dsc: bzr import-dsc ../*.dsc
  6. Now, you may find that some dot files, such as .bzrignore, have been discarded inappropriately (there is a bug open on this). If that happened, keep going. Otherwise, you're done: you can now use merge-upstream on future upstream releases, and debcommit etc.
  7. bzr uncommit
  8. bzr revert .bzrignore (and any other files that you want to get back)
  9. debcommit
  10. All done, see point 6 for details.
Hope-this-helps

18 December 2009

Joey Hess: upstreams and packaging

As distributors we should not discourage upstreams that wish to generate binary packages themselves, rather we should cooperate with them, and ideally they will end up maintaining their stable release packages in our distributions.
-- Robert Collins
I agree. Robert goes on to talk about the tendency Debian (and apparently also Ubuntu) have to dislike upstream providing a debian directory. Funny thing is that we also like to say team maintenance is a good idea. An upstream who is doing packaging work is a readymade half of a team; if you write them telling them to rm -rf debian, you are turning away a team member. With a slap in the face. Worse, it's a team member who has demonstrated that they are capable of working in far more complex systems than a debian directory, since they wrote/maintain the software itself. BTW, for those who are skeptical of teams, on the basis that they dilute responsibility: a team consisting of an upstream developer and a distribution maintainer is inherently the more healthy sort of team, where each member has a well-defined area of expertise, but can also venture outside their area when needed. When I packaged FBReader for Debian, upstream already had their own packaging, and I worked with them to fix problems with it and make it something I could be happy maintaining. This was sometimes tricky, since upstream was also maintaining packages for maemo. Sometimes the tools were not ideal. It was still worth it. On the other side of the coin, d-i is an upstream for several distributions. If they told us we needed to rm -rf debian, we'd not have a lot of d-i left.
tools
I also agree with Robert that most of the trouble comes down to problems with tools. BTW, RPM does not have these problems, and in that world it's typical for upstream to provide a spec file. Much of the problem comes down to the crumminess of dpkg's source format, which cannot rename files, delete files, etc. That the source format directs us to 1970s source management (ie, tarballs and patches), instead of 21st century source management (ie, DVCS), doesn't help either. dpkg's 3.0 quilt source format tries to address the issue by removing any debian directory in upstream's tarball. I am not sure that this is at all the right approach; it makes it harder, not easier, to work with upstream as a team. A better approach might be to consider anything that hardcodes the debian directory as having a bug. If upstream can easily package debs using a deb, or maemo, or ubuntu directory, you sidestep any potential conflict while still being able to work with them via a symlink. To put my time where my mouth is: anytime debhelper prevents working with an upstream who wants to ship a debian directory, that is a bug. I will fix them. (Note that debhelper already provides a --ignore option that can be used if upstream has provided a debhelper control file that you can't delete and don't want used.) Previously: Look who's packaging

Robert Collins: Why upstreams should do distribution packaging


Software comes in many shapes and styles. One of the problems the author of software faces is distributing it to their users. As distributors we should not discourage upstreams that wish to generate binary packages themselves, rather we should cooperate with them, and ideally they will end up maintaining their stable release packages in our distributions. Currently the Debian and Ubuntu communities have a tendency to actively discourage this by objecting when an upstream software author includes a debian/ directory in their shipped code. I don't know if Redhat or Suse have similar concerns, but for the dpkg toolchain, the presence of an upstream debian directory can cause toolchain issues. In this blog post, I hope to make a case that we should consider the toolchain issues bugs, rather than just-the-way-it-is, or even features. To start at the beginning, consider the difficulty of installing software: the harder it is to install a piece of software, the more important having it has to be for a user to jump through hoops to install it. Thus projects which care about users will make it easy to install, and there is a spectrum of ease. At one end,
checkout from version control, install various build dependencies like autoconf, gcc and so on
through to
download and run this installer
Now, where some software authors get lucky is when someone else makes it easy to install their software by making binary packages, so that users can simply do
apt-get install product
Now some platforms like Mac OS X and Microsoft Windows really do need an installer, but in the Unix world we generally have packaging systems that can track interdependencies between libraries, download needed dependencies automatically, perform uninstalls and so on. Binary packaging in a Linux distribution has numerous benefits, including better management of security updates (because a binary package can sensibly use shared libraries that are not part of the LSB). So given the above, it's no surprise to me to see the following sort of discussion on #ubuntu-motu:
  1. upstream> Hi, I want to package product.
  2. developer> Hi, you should start by reading the packaging guide
  3. (upstream is understandably daunted: the packaging guide is a substantial amount of information, but only a small fraction is needed to package any one product.)
or (less usefully)
  1. upstream> Hi, I want to package product.
  2. developer> If you want to contribute, you should start with existing bugs
  3. upstream> But I want to package product.
Another conversation, which I think is very closely related is
  1. developer> Argh, product has a debian dir, why do they do this to me?!
The reasons for this should be pretty obvious at this point:
  • Folk want to make their product easy to install and are not themselves DDs, DMs or MOTUs.
  • So they package it privately such as in a PPA, or their own archive.
  • When they package it, they naturally put the packaging rules in their source tree.
Now, why should we encourage this, rather than ask the upstream to delete their debian directory? Because it lets us, the distributors, share the packaging effort with the upstream. Upstreams that are making packages will likely be doing this for betas, or even daily builds. As such they will find issues related to new binaries, libraries and so on well in advance of their actual release. And if we are building on their efforts, rather than discarding them, we can spend less time repeating what they did and more time packaging other things. We can also encourage the upstream to become a maintainer in the distro and do their own uploads: many upstreams will come to this on their own, but by working with them as they take their early steps we can make this more likely and an easier path.

19 October 2009

Robert Collins: Government data: please do it right


The Australian Government 2.0 Taskforce has an initiative to make data available for public remixing and use: after all, it's public property anyway, right? They have even run a mashup competition. Notably missing from the excellent collection of data that has been opened is the NSW Transport and Infrastructure dataset for public transport in NSW. There is a similar dataset for the Northern Territory in the mashup transport section. The NT dataset is under the fantastic cc-by licence. You can write an iPhone app with this, a journey planner that you can cart with you while disconnected, a "find the closest bus I can walk to" tool, or, well, let the imagination run wild. The NSW dataset is under a heavily restrictive license. It's so restrictive I'm not sure it's feasible to write an open source tool using its data. The meta-issue is that the NSW T&I department wants control over the applications built with this data. This adds a tremendous chilling effect on potential uses of the data: the department will have to approve, with a long lead time, every use of the data, and gets to tell the application developer what changes to make to their application. I strongly doubt that a simple remixing of the data (e.g. with weather reports, to prefer buses on a very wet day) would be permitted, as it would allow other users to just read the remix and get the original data /without entering into a license agreement/. I'm sure there is some unstated risk of openness, or benefit of control, that is shaping this problematic approach. Whatever the cause, it's not open at all. Given that the overall approach is fundamentally flawed, a blow-by-blow analysis of the custom license isn't particularly useful; however I thought I would pick some highlights out to save folk the trouble ;)
  1. The dataset is behind a username/password wall [that you cannot share with others].
  2. Licensees may not be private: everyone must know you're using the data.
  3. You must link to the 131500.com.au website
  4. You may not charge users for an app that has to be redeveloped if the dataset changes shape
  5. Any application written to use the dataset must be given to the department 30 days before release to the public.
  6. The department gets to suggest changes to any announcement related to the developer's app, the license agreement or the dataset.
  7. The dataset is embargoed: you cannot share it with others.
  8. The use of the dataset has to be logged and reported.
  9. There is a restraint of use in there as well, related to Inappropriate and Offensive Material. It wouldn't affect me, but sheesh, given all the other restraints it's hardly needed.
There are more gems in the details, but in short: the department will control what, where, when and how (how the data is accessed, the application's functionality/appearance, how it is used). Hell, the 30-day requirement alone makes for slow delivery of whatever someone wants to build. I really hope this can be improved on.

3 October 2009

Robert Collins: Subunit-0.0.3


Subunit 0.0.3 should be a great little release. It's not ready yet, but some key things have been done. Firstly, it's been relicensed under BSD/Apache version 2. This makes using Subunit with other test frameworks much easier, as those frameworks tend to be under permissive licences such as the LGPL, BSD or Apache. Thanks go out to the contributors to Subunit, who made this process very painless. Secondly, the C client code is getting a few small touch-ups, probably not enough to reach complete feature parity with the Python reporter. Thirdly, the CPPUnit patch that Subunit has carried for ages has been turned into a small library built by Subunit, so you'll be able to just install that into an existing CPPUnit environment without rebuilding CPPUnit. Lastly, but most importantly, it will have what is hopefully the last major protocol change (still backwards compatible!) needed for 1.0: the ability to attach fairly arbitrary debug data to an outcome (things like stdout, stderr, a log file X and so forth). This will be used via an experimental object protocol, the one I proposed on the Testing In Python list. I should get the protocol changes done on the flight to Montreal tomorrow, which would be a great way for me to get my mind fully focused on testing for the sprint next week.

22 September 2009

Robert Collins: Python unittest API: Time to fix it


So, for ages now I've been saying that unittest is, at its core, pretty sound. I incited a talk to this effect. I have a vision; I dream of a python testing library that:
  1. Is in the python core
  2. Is simple
  3. Is extensible
  4. Has tests take care of testing
  5. Has results take care of reporting
  6. Aids communication from test to test reader
Hopefully those are pretty modest and agreeable things to want. However we don't have this: nose is lovely but not in the core [and has a moderately complex API]. py.test is also not in the core, and has previously tripped my too-much-magic alerts. I must admit to not having checked if this is fixed yet. unittest itself is in the core but has some cruft which we should clean up, and, more importantly, is not extensible enough, which leads to extensions such as the zope testrunner having to muddy the waters between testing and reporting. The point "Aids communication from test to test reader" is worth expanding on: automated testing is something that doesn't need observation until the unexpected happens. At that point some poor schmuck such as you or I ends up trying to guess what went wrong. The more data that we gather and communicate about the event, the greater the chance it can be corrected without needing a repeat run under a debugger, or worse, single stepping through the code. There is a problem with assertFoo methods in unittest, something that I'm not going to cram into this blog post. I will say, if you find the tendency of such methods to crawl to the base class frustrating, that you should look at hamcrest: it and similar things have been very successful in the Java unit testing world; we can learn from them. Going back to my vision, we need to make unittest more powerfully extensible to allow projects like nose to do all the cool things they want to while still being unittest compatible. I don't mean that nose can't run unittest tests; I mean that unittest can't run nose tests: nose has had to expand the contract, not simply add implementations that do more. To that end I have a number of bugs which I need to file. Solving them piecemeal will create a fractured API, particularly if this is done over more than one release. So I am planning on prototyping in other projects, discussing like mad on the testing-in-python list, and when it all starts to come together, writing up a PEP. The bugs I have are:
  1. Streams nicely: countTestCases must die or be made optional. This function is inherently incompatible with generative tests or anything beyond the simplest lightweight environments.
  2. No way to wrap code around a single test. This would permit profiling, debugging, tracing, and I'm sure other things, more cleanly. (At the moment, one must turn on the profiler in startTestCase, and turn it off in stopTestCase. This is much more awkward than simply being in the call stack.) Some care will be needed here, particularly for generative tests.
  3. Code that isn't part of the implementation in the core needs to be able to work with the reporting code; allowing an optionally wider API permits extensions to be debuggable. This needs thought: do we allow direct access to TestResults? Do we come up with some added level of indirection and "events"? I don't know.
  4. More data than just the backtrace needs to be included when an outcome is reported. I've started a discussion on the testing in python list about this. I'm proposing that we use a dict of named content objects, and use the HTTP content-type abstraction to make the content objects introspectable and reliably handleable without tying the unittest object protocol to any given wire format: loose coupling is good!
  5. The way we signal outcomes between TestCase and TestResult (the addFailure etc methods) is concerning: there are many grades of outcome that users of the framework may usefully wish to represent; in fact there are more than we probably want to put in the core. Finding a way to decouple the intent of a particular outcome from how it's signalled would allow users more control while still being able to use the core framework. One particular issue in this area is that it's possible with the current API to have a single test object succeed multiple times. Or fail (addFailure) then succeed (addSuccess). This causes no end of confusion, as test counts can mismatch failure counts, and so on.
I've got some ideas about these bugs, but I'm approaching a kiloword already, and I hope this post has enough to provoke some serious thought about how we can fix these 5 bugs, compatibly, and end up with a significantly better unittest module. We'll have been successful if projects like Trial, nose and the zope testrunner are able to remove all their code that duplicates standard library functionality or otherwise works around these bugs, and can instead focus on adding the specific test support needed by their environments (in the Trial and zope cases), or on UI and plug-n-play (for nose).
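The hamcrest reference above is about exactly this kind of separation: "decide if an exception should be raised" kept apart from "raise it". A tiny sketch of the idea follows; the names are purely illustrative, not any existing or proposed API:
class IsDivisibleBy(object):
    """Decide whether there is a mismatch, without raising anything."""
    def __init__(self, divisor):
        self.divisor = divisor
    def match(self, actual):
        remainder = actual % self.divisor
        if remainder:
            return "%r %% %r left remainder %r" % (actual, self.divisor, remainder)
        return None  # no mismatch

def assert_that(actual, matcher):
    """Raise only if the matcher reports a mismatch."""
    mismatch = matcher.match(actual)
    if mismatch is not None:
        raise AssertionError(mismatch)

assert_that(9, IsDivisibleBy(3))
Because the matcher only describes the mismatch, the same checking logic can be reused, composed, or reported in richer ways than a grab-bag of assertFoo methods allows.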

20 September 2009

Robert Collins: Packaging backlog


Got some of my packaging backlog sorted out:
  • bicyclerepairman updated for the vim policy (which means it works again!)
  • python-testtools (a simple migration of the package to Debian)
  • subunit 0.0.2 released upstream and packaged for Debian.
  • testresources 0.2 -> Debian.
And a small memo-to-self: On all new machines, echo filetype plugin on >> ~/.vimrc

16 September 2009

Robert Collins: Back from hiatus


Well, the new blog seems to be up and running and gathering modest numbers of comments already. Woo. I've a bunch of mail about test suite performance to gather and refine into a follow-up post, but that can wait a day or two. In bzr we suffer from a long test suite, which we let grow while we had some other very pressing performance concerns. 2.0 fixes those concerns, and we're finding the odd moment to address our development environment a little now. One of the things I want to do is to radically reduce the cost of testing inside bzr; code coverage is a great way to get a broad picture of what is tested. Rather than reinvent the wheel (and I've written one to throw away, so far), are there tools out there that can:
  • build a per-test coverage map
  • do it quickly
  • don't include setUp/tearDown/cleanUp code paths in the map
  • report on the difference between two such maps (at the suite level)
The third point is possibly contentious, so I should expand on it. While code that is executed by code within the test's run() method is, from the outside, all part of the test, it's not (by definition) the focus of the test. And I find focused tests substantially easier to analyse failures in, because they tend to check preconditions, poke at object state etc. As I want this coverage map to help preserve coverage as we refactor the test suite, I don't want to include accidental coverage in the map.
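Not an answer to the wish list, but a rough sketch of the first point using the coverage library's current API (an assumption on my part, not something from any existing tool): wrap each test with start/stop and record the lines it executed. It does nothing about excluding setUp/tearDown paths or diffing maps.
import coverage
import unittest

class PerTestCoverageResult(unittest.TestResult):
    """Record the set of (file, line) pairs executed by each test."""

    def __init__(self):
        super(PerTestCoverageResult, self).__init__()
        self.coverage_map = {}
        self._cov = coverage.Coverage()

    def startTest(self, test):
        super(PerTestCoverageResult, self).startTest(test)
        self._cov.start()

    def stopTest(self, test):
        self._cov.stop()
        data = self._cov.get_data()
        lines = set()
        for filename in data.measured_files():
            for line in data.lines(filename) or ():
                lines.add((filename, line))
        self.coverage_map[test.id()] = lines
        self._cov.erase()
        super(PerTestCoverageResult, self).stopTest(test)
Running a suite with suite.run(PerTestCoverageResult()) then leaves a per-test map in coverage_map; diffing two such maps at the suite level would be a separate step.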
